(57) Abstract

Two video signals, typically an original signal (16) and a degraded version (16d) of the same signal, are analysed firstly to identify 
the perceptually relevant boundaries of the elements forming the video images depicted therein (31). These boundaries are then compared 
(33) to determine the extent to which the properties of the boundaries defined in one image (16) are preserved in the other (16d), to generate 
an output (38) indicative of the perceptual difference between the first and second signals. The boundaries may be defined by edges, 
colour, luminance or texture contrasts, disparities between frames in a moving or stereoscopic image, or other means. The presence, absence,
difference in clarity or difference in means of definition of the boundaries is indicative of the perceptual importance of the differences 
between the signals, and therefore of the extent to which any degradation of the signal (16d) will be perceived by the human viewer of 
the resulting degraded image. The results may also be weighted (36) according to the perceptual importance of the image depicted - for 
example the features which identify a human face, and in particular those responsible for visual speech cues.

ANALYSIS OF VIDEO SIGNAL QUALITY

This invention relates to the analysis of the quality of video signals. It has a
number of applications in monitoring the performance of video transmission
equipment, either during development, under construction, or in service.

As communications systems have increased in complexity it has become
increasingly difficult to measure their performance objectively. Modern
communications links frequently use data compression techniques to reduce the
bandwidth required for transmission. When signals are compressed for more efficient
transmission, conventional engineering metrics, such as signal-to-noise ratio or bit
error rate, are unreliable indicators of the performance experienced by the human
being who ultimately receives the signal. For example, two systems having similar
bit-error rates may have markedly different effects on the quality of the data (sound or
picture) presented to the end user, depending on which digital bits are lost. Other
non-linear processes such as echo cancellation are also becoming increasingly
common. The complexity of modern communications systems makes them
unsuitable for analysis using conventional signal processing techniques. End-to-end
assessment of network quality must be based on what the customer has, or would
have, heard or seen.

The main benchmarks of viewer opinion are the subjective tests carried out
to International Telecommunications Union standards P.800, "Methods for subjective
determination of transmission quality", 1996 and P.911 "Subjective audiovisual
quality assessment methods for multimedia applications", 1998. These measure
perceived quality in controlled subjective experiments, in which several human
subjects listen to each signal under test. This is impractical for use in the continuous
monitoring of a network, and also compromises the privacy of the parties to the calls
being monitored. To overcome these problems, auditory perceptual models such as
those of the present applicant's International Patent Specifications WO94/00922,
WO95/01011, WO95/15035, WO97/05730, WO97/32428, WO98/53589 and
WO98/53590 are being developed for measuring telephone network quality. These
are objective performance metrics, but are designed to relate directly to perceived
signal quality, by producing quality scorings similar to those which would have been
reported by human subjects.

The prior art systems referred to above measure the quality of sound (audio)
signals. The present invention is concerned with the application of similar principles to
video signals. The basic principle, of emulating the human perceptual system (in this
case the eye/brain system instead of the ear/brain system) is still used, but video
signals and the human visual perceptual system are both much more complex, and
raise new problems.

As with hearing, the human visual perception system has physiological
properties that make some features present in visual stimuli very difficult or
impossible to perceive. Compression processes, such as those established by JPEG
(Joint Photographic Experts Group) and MPEG (Moving Picture Experts Group), rely on
these properties to reduce the amount of information to be transmitted in video signals
(moving or still). Two compression schemes may result in similar losses of
information, but the perceived quality of a compressed version of a given image may
be very different according to which scheme was used. The quality of the resulting
images cannot therefore be evaluated by simple comparison of the original and final
signals. The properties of human vision have to be included in the assessment of
perceived quality.

It is difficult to extract meaningful information from an image by mathematical
processing of pixel values. The pixel intensity level becomes meaningful only when
processed by the human subject's visual knowledge of objects and shapes. In this
invention, mathematical solutions are used to extract information resembling that
used by the eye-brain system as closely as possible.

A number of different approaches to visual modelling have been reported.
These are specialised to particular applications, or to particular types of video
distortion. For example, the MPEG compression system seeks to code the differences
between successive frames. At periods of overload, when there are many differences
between successive frames, this process reduces the pixel resolution, causing blocks
of uniform colour and luminance to be produced. Karunasekera, A. S., and Kingsbury,
N. G., in "A distortion measure for blocking artefacts in images based on human
visual sensitivity", IEEE Transactions on Image Processing, Vol. 4, No. 6, pages
713-724, June 1995, propose a model which is especially designed to detect "blockiness"
of this kind. However, such blockiness does not always signify an error, as the effect
may have been introduced deliberately by the producer of the image, either for visual
effect or to obliterate detail, such as the facial features of a person whose identity it
is desired to conceal.

If the requirements of a wide range of applications, from high definition
television to video conferencing and virtual reality, are to be met, a more complex
architecture has to be used.

Some existing visual models have an elementary emulation of perceptual
characteristics, referred to herein as a "perceptual stage". Examples are found in the
Karunasekera reference already discussed, and Lukas, F. X. J., and Budrikis, Z. L.,
"Picture Quality Prediction Based on a Visual Model", IEEE Transactions on
Communications, vol. COM-30, No. 7, pp. 1679-1692, July 1982, in which a simple
perceptual stage is designed around the basic principle that large errors will dominate
subjectivity. Other approaches have also been considered, such as a model of the
temporal aggregation of errors described by Tan, K. T., Ghanbari, M. and Pearson, D.,
"A video distortion meter", Informationstechnische Gesellschaft, Picture Coding
Symposium, Berlin, September 1997. However, none of these approaches addresses
the relative importance of all errors present in the image.

For the purposes of the present specification, the "colour" of a pixel is
defined as the proportions of the primary colours (red, green and blue) in the pixel.
The "luminance" is the total intensity of the three primary colours. In particular,
different shades on a grey scale are caused by variations in luminance.

According to a first aspect of the present invention, there is provided a
method of measuring the differences between a first video signal and a second video
signal, comprising the steps of:

analysing the information content of each video signal to identify the
perceptually relevant boundaries of the video images depicted therein;

comparing the boundaries so defined in the first signal with those in the
second signal; the comparison including determination of the extent to which the
properties of the boundaries defined in the original image are preserved, and

generating an output indicative of the perceptual difference between the first
and second signals.

According to a second aspect of the present invention, there is provided
apparatus for measuring the differences between a first video signal and a second
video signal, comprising:

analysis means for the information content of each video signal, arranged to
identify the perceptually relevant boundaries of the video images depicted therein;

comparison means for comparing the boundaries so defined in the first signal
with those in the second signal; the comparison including determination of the extent
to which the properties of the boundaries defined in the original image are preserved,
and means for generating an output indicative of the perceptual difference
between the first and second signals.

The boundaries between the main elements of an image may be identified by
any measurable property used by the human perceptual system to distinguish
between such elements. These may include, but are not limited to, colour, luminance,
so-called "hard" edges (a narrow line of contrasting colour or luminance defining an
outline or other boundary, such a line being identifiable in image analysis as a region
of high spatial frequency), and others which will be discussed later.

The properties of the boundaries on which the comparison is based include
the characteristics by which such boundaries are defined. In particular, if a boundary
is defined by a given characteristic, and that characteristic is lost in the degraded
image, the degree of perceived degradation of the image element is dependent on
how perceptually significant the original boundary was. If the element defined by the
boundary can nevertheless be identified in the degraded image by means of a
boundary defined by another characteristic, the comparison also takes account of
how perceptually significant such a replacement boundary is, and how closely its
position corresponds with the original, lost, boundary.

The basis for the invention is that elements present in the image are not of
equal importance. An error will be more perceptible if it disrupts the shape of one of
the essential features of the image. For example, a distortion present on an edge in
the middle of a textured region will be less perceptible than the same error on an
independent edge. This is because an edge forming part of a texture carries less
information than an independent edge, as described by Ran, X., and Farvardin, N., "A
Perceptually Motivated Three-Component Image Model - Part II: Application to Image
Compression", IEEE Transactions on Image Processing, Vol. 4, No. 4, pp. 713-724,
April 1995. If, however, a textured area defines a boundary, an error that changes
the properties of the texture throughout the textured area can be as important as an
error on an independent edge, if the error causes the textured characteristics of the
area to be lost. The present invention examines the perceptual relevance of each
boundary, and the extent to which this relevance is preserved.

The process identifies the elements of greatest perceptual relevance, that is
the boundaries between the principal elements of the image. Small variations in a
property within the regions defined by the boundaries are of less relevance than errors
that cause the boundary to change its shape.

Moreover, the process allows comparison of this information independently
of how the principal elements of the images are identified. The human perceptual
system can distinguish different regions of an image in many different ways. For
example, the absence of a "hard edge" will create a greater perceptual degradation if
the regions separated by that edge are of similar colour than it will if they are of
contrasting colours, since the colour contrast will still allow a boundary to be
perceived. The more abrupt the change, the greater the perceptual significance of the
boundary.

By analysing the boundaries defined in the image, a number of further
developments become possible.

The boundaries can be used as a frame of reference, by identifying the
principal elements in each image and the differences in their relative positions. By
using differences in relative position, as opposed to absolute position, perceptually
unimportant differences in the images can be disregarded, as they do not affect the
quality of the resulting image as perceived by the viewer. In particular, if one image is
offset relative to another, there are many differences between individual pixels of one
image and the corresponding pixels of the other, but these differences are not
perceptually relevant provided that the boundaries are in the same relative positions.
By referring to the principal boundaries of the image, rather than an absolute (pixel
co-ordinate) frame of reference, any such offset can be compensated for.
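
By way of illustration only, the use of relative rather than absolute positions can be sketched as follows. The sketch assumes that each image has already been segmented into an integer label map in which the same label denotes the same principal element in both images; the function names and the label-map representation are hypothetical and are not taken from the specification.

```python
import numpy as np

def element_centroids(label_map):
    """Centroid (row, col) of every non-zero label in an integer label map."""
    centroids = {}
    for label in np.unique(label_map):
        if label == 0:  # label 0 is assumed to mean "no element"
            continue
        rows, cols = np.nonzero(label_map == label)
        centroids[int(label)] = np.array([rows.mean(), cols.mean()])
    return centroids

def relative_position_error(labels_a, labels_b):
    """Mean displacement of shared elements once any common offset is removed."""
    cent_a = element_centroids(labels_a)
    cent_b = element_centroids(labels_b)
    shared = sorted(set(cent_a) & set(cent_b))
    if not shared:
        raise ValueError("the two label maps share no elements")
    pos_a = np.array([cent_a[k] for k in shared])
    pos_b = np.array([cent_b[k] for k in shared])
    # Express positions relative to the mean centroid of each image, so that a
    # uniform shift of the whole picture contributes nothing to the error.
    pos_a = pos_a - pos_a.mean(axis=0)
    pos_b = pos_b - pos_b.mean(axis=0)
    return float(np.linalg.norm(pos_a - pos_b, axis=1).mean())
```

With this measure a copy of an image displaced as a whole scores approximately zero, whereas moving one element relative to the others gives a non-zero result.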

The analysis may also include identification of perceptually significant image
features, again identified by the shapes of the boundaries identified rather than how
these boundaries are defined. The output indicative of the perceptual difference
between the first and second signals can be weighted according to the perceptual
significance of such image features. Significant features would include the various
characteristics which make up a human face, in particular those which are significant
in providing visual speech cues. Such features are of particular significance to the
human cognitive system and so errors such as distortion, absence, presence of
spurious elements or changes in relative position are of greater perceptual relevance
in those features than in others.

In an image containing text, those features which distinguish one character
of a typeface from another (for example the serif on a letter "G" which distinguishes
it from a "C") are perceptually significant.

An embodiment of the invention will now be described, by way of example
only, with reference to the Figures, in which:

Figure 1 illustrates schematically a first, sensory emulation, stage of the
system;
Figure 2 illustrates the filter parameters used in the sensory emulation stage;
Figure 3 illustrates schematically a second, perceptual, stage of the system;
Figures 4, 5, 6 and 7 illustrate four ways in which boundaries may be
perceived.

In this embodiment the measurement process comprises two stages,
illustrated in Figures 1 and 3 respectively. The first - the sensory emulation stage -
accounts for the physical sensitivity of the human visual system to given stimuli. The
second - the perceptual stage - estimates the subjective intrusion caused by the
remaining visible errors. The various functional elements shown in Figures 1 and 3
may be embodied as software running on a general-purpose computer.

The sensory stage (Figure 1) reproduces the gross psychophysics of the
sensory mechanisms:

(i) spatio-temporal sensitivity known as the human visual filter, and
(ii) masking due to spatial frequency, orientation and temporal frequency.

Figure 1 gives a representation of the sensory stage, which emulates the
physical properties of the human visual system. The same processes are applied to
both the original signal and the degraded signal: these may be carried out
simultaneously in parallel processing units, or they may be performed for each signal
in turn, using the same processing units.

The sensory stage identifies whether details are physically perceptible, and
identifies the degree to which the visual system is sensitive to them. To do so, it
emulates the two main characteristics of the visual system that have an influence on
the physical perceptibility of a visual stimulus:

• sensitivity of the eye/brain system

• masking effects - that is the variation in perceptual importance of one stimulus
according to the presence of other stimuli.

Each of these characteristics has both a time and a space dimension, as will
now be discussed.

Each signal is first filtered in temporal and spatial frequency by a filter 12, to
produce a filtered sequence. The values used in the filter 12 are selected to emulate
the human visual response, as discussed below in relation to Figure 2. This filter
allows details that are not visible to a human visual (eye/brain) system to be removed,
and therefore not counted as errors, while the perceptibility of details at other spatial
and temporal frequencies is increased by the greater sensitivity of the human sensory
system at those frequencies. This has the effect of weighting the information that the
signals contain according to visual acuity.

The human visual system is more sensitive to some spatial and temporal
frequencies than others. Everyday experience teaches us that we cannot see details
smaller than a certain size. Spatial resolution is referred to in terms of spatial
frequency, which is defined by counting the number of cycles of a sinusoidal pattern
present per degree subtended at the eye. Closely spaced lines (fine details)
correspond to high spatial frequencies, while large patterns correspond to low spatial
frequencies. Once this concept is introduced, human vision can be compared to a
filter, with peak (mid-range) sensitivity to spatial frequencies of around 8
cycles/degree and insensitivity to high frequencies (more than 60 cycles/degree). A
similar filter characteristic can be applied in the temporal domain, where the eye fails
to perceive flickering faster than about 50 Hz. The overall filter characteristic for
both spatial and temporal frequency can be represented as a surface, as shown in
Figure 2, in which the axes are spatial and temporal frequency (measured in
cycles/degree and Hertz respectively). The vertical axis is sensitivity, with units
normalised such that maximum sensitivity is equal to 1.
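
A surface of this general kind might be sketched as below. Only the figures of about 8 cycles/degree, 60 cycles/degree and 50 Hz come from the description; the functional form (a band-pass term multiplied by a temporal roll-off), the function names and the viewing parameters `pixels_per_degree` and `frames_per_second` are assumptions made purely for illustration, and the actual filter values of Figure 2 are not reproduced here.

```python
import numpy as np

PEAK_SPATIAL_CPD = 8.0      # peak spatial sensitivity (cycles/degree), from the text
TEMPORAL_CUTOFF_HZ = 50.0   # flicker limit (Hz), from the text

def sensitivity(spatial_cpd, temporal_hz):
    """Illustrative sensitivity surface, normalised to a maximum of 1.

    Assumed form: x*exp(1-x) peaks at the chosen spatial frequency and is
    close to zero by 60 cycles/degree; the temporal term falls to zero at
    the flicker limit.
    """
    x = np.abs(np.asarray(spatial_cpd, dtype=float)) / PEAK_SPATIAL_CPD
    spatial = x * np.exp(1.0 - x)
    t = np.abs(np.asarray(temporal_hz, dtype=float)) / TEMPORAL_CUTOFF_HZ
    temporal = np.clip(1.0 - t ** 2, 0.0, None)
    return spatial * temporal

def filter_sequence(frames, pixels_per_degree, frames_per_second):
    """Weight a (time, height, width) sequence by the surface in the frequency domain."""
    n_t, n_y, n_x = frames.shape
    f_t = np.fft.fftfreq(n_t, d=1.0 / frames_per_second)   # Hz
    f_y = np.fft.fftfreq(n_y, d=1.0 / pixels_per_degree)   # cycles/degree
    f_x = np.fft.fftfreq(n_x, d=1.0 / pixels_per_degree)   # cycles/degree
    radial = np.sqrt(f_y[:, None] ** 2 + f_x[None, :] ** 2)
    weights = sensitivity(radial[None, :, :], f_t[:, None, None])
    return np.real(np.fft.ifftn(np.fft.fftn(frames) * weights))
```

Because the assumed surface is zero at zero spatial frequency, the filtered sequence has zero mean; a different choice of surface would retain the average brightness.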

The second aspect of vision to be modelled by the sensory stage is known as
"masking", the reduced perceptibility of errors in areas of an image where there is
greater spatial activity present, and the temporal counterpart of this effect decreases
the visibility of details as the rate of movement increases. Masking can be understood
by considering the organisation of the primary cortex, the first stage of the brain
responsible for visual processing. Each part of the cortex is sensitive to a certain
region of the retina. The incoming image stream is divided into groupings (known as
channels) of spatial frequency, temporal frequency and orientation. The "next stage"
of the brain processes the image stream as a set of channels, each accounting for a
combination of spatial/temporal frequency and orientation in the corresponding area
of the retina. Once a given channel is excited, it tends to inhibit its neighbours,
making it more difficult to detect other details that are close in proximity, spatial or
temporal frequency, or orientation.

Masking is a measure of the amount of inhibition a channel causes to its
neighbours. This information is obtained by studying the masking produced by
representative samples of channels, in terms of spatial/temporal frequency and
orientation characteristics. For the sensory stage to simulate activity masking, it is
necessary to know the amount of activity present in each combination of spatial
frequency and orientation within an image. This calculation can be performed using a
Gabor function, a flexible form of bandpass filter, to generate respective outputs 14
in which the content of each signal is split by spatial frequency and orientation.
Typically, sixteen output channels are used for each output signal, comprising four
spatial orientations (vertical, horizontal, and the two diagonals), and four spatial
frequencies. The resulting channels are analysed by a masking calculator 15. This
calculator modifies each channel in accordance with the masking effect of the other
channels; for example the perceptual importance of a low spatial-frequency
phenomenon is reduced if a higher frequency spatial phenomenon is also present.
Masking also occurs in the temporal sense - certain features are less noticeable to
the human observer if other effects occur within a short time of them.
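
The sixteen-channel decomposition can be sketched, by way of example only, with Gabor filters built directly in the frequency domain. The four centre frequencies, the relative bandwidth and all names below are illustrative assumptions. The `apply_masking` function is a simple divisive attenuation standing in for the calibrated masking calculator 15, whose actual rule is not given in the description.

```python
import numpy as np

ANGLES_DEG = (0, 45, 90, 135)          # direction of modulation: four orientations
FREQUENCIES = (0.05, 0.1, 0.2, 0.4)    # cycles/pixel; illustrative values only

def gabor_bank(shape, relative_bandwidth=0.5):
    """Frequency-domain Gabor filters keyed by (frequency, angle_deg)."""
    f_y = np.fft.fftfreq(shape[0])[:, None]
    f_x = np.fft.fftfreq(shape[1])[None, :]
    bank = {}
    for f0 in FREQUENCIES:
        spread = relative_bandwidth * f0
        for angle in ANGLES_DEG:
            c_x = f0 * np.cos(np.radians(angle))
            c_y = f0 * np.sin(np.radians(angle))
            # A Gaussian-windowed sinusoid in space corresponds to a pair of
            # Gaussians centred on +/-(c_x, c_y) in the frequency domain.
            lobe_pos = np.exp(-((f_x - c_x) ** 2 + (f_y - c_y) ** 2) / (2 * spread ** 2))
            lobe_neg = np.exp(-((f_x + c_x) ** 2 + (f_y + c_y) ** 2) / (2 * spread ** 2))
            response = lobe_pos + lobe_neg
            response[0, 0] = 0.0   # discard the mean level of the image
            bank[(f0, angle)] = response
    return bank

def channel_energies(image):
    """Mean squared output of each of the sixteen channels."""
    spectrum = np.fft.fft2(image)
    energies = {}
    for key, response in gabor_bank(image.shape).items():
        channel = np.real(np.fft.ifft2(spectrum * response))
        energies[key] = float(np.mean(channel ** 2))
    return energies

def apply_masking(energies, strength=1.0):
    """Attenuate each channel according to the activity in all other channels."""
    total = sum(energies.values())
    return {key: e / (1.0 + strength * (total - e)) for key, e in energies.items()}
```

For a test pattern of vertical stripes at 0.1 cycles/pixel, the channel keyed `(0.1, 0)` carries the greatest energy, and every masked energy is no larger than its unmasked value.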

Calibration of this model of masking requires data describing how
spatial/temporal frequency of a given orientation decreases the visibility of another
stimulus. This information cannot be obtained as a complete description as the
number of combinations is very large. Therefore, the separate influence of each
parameter is measured. First the masking effect of a background on a stimulus is
measured according to the relative orientation between the two. Then the effect of
spatial and temporal frequency difference between masker and stimulus is measured.
Finally, the two characteristics are combined by interpolating between common
measured points.

In a simple comparison between original and degraded frames, certain types
of error, such as a horizontal/vertical shift, result in large amounts of error all over the
frame, but would not be noticeable to a user. This problem can be addressed by
employing frame realignment, as specified in the ITU-T "Draft new recommendation
on multimedia communication delay, synchronisation, and frame rate measurement",
COM 12-29-E, December 1997. However this simple method does not fully account
for the subjectivity of the error, since it does not allow for other common defects
such as degradation of elements in the compressed sequence.
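
For illustration, a whole-frame shift of this kind can be estimated and undone as sketched below. The sketch uses phase correlation, a generic technique chosen here for brevity; it is not the procedure of the ITU-T draft recommendation, and the function names are hypothetical. Shifts are treated as circular, which is an additional simplifying assumption.

```python
import numpy as np

def estimate_shift(reference, degraded):
    """Integer (rows, cols) displacement of `degraded` relative to `reference`."""
    cross = np.fft.fft2(degraded) * np.conj(np.fft.fft2(reference))
    cross /= np.abs(cross) + 1e-12          # keep phase information only
    correlation = np.real(np.fft.ifft2(cross))
    peak = np.unravel_index(np.argmax(correlation), correlation.shape)
    # Peaks beyond the midpoint represent negative shifts (circular wrap-around).
    return tuple(int(p) if p <= n // 2 else int(p) - n
                 for p, n in zip(peak, correlation.shape))

def realign(reference, degraded):
    """Undo the estimated displacement with a circular shift."""
    d_rows, d_cols = estimate_shift(reference, degraded)
    return np.roll(degraded, (-d_rows, -d_cols), axis=(0, 1))
```

Such realignment removes the pixel-by-pixel error caused by a pure shift but, as noted above, says nothing about how subjectively important any remaining errors are.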

Following the sensory stage, the image is decomposed to allow calculation of
error subjectivity by the perceptual stage (Figure 3), according to the importance of
errors in relation to structures within the image. If the visible error coincides with a
critical feature of the image, such as an edge, then it is more subjectively disturbing.
The basic image elements, which allow a human observer to perceive the image
content, can be thought of as a set of abstracted boundaries. These boundaries can
be formed by colour and luminance differences, texture changes and movement as
well as edges, and are identified in the decomposed image. Even some "Gestalt"
effects, which cause a boundary to be perceived where none actually exists, can be
algorithmically measured.

These boundaries are required in order to perceive image content and this is
why visible errors that degrade these boundaries, for example by blurring or changing
their shape, have greater subjective significance than those which do not. The output
from the perceptual stage is a set of context-sensitive error descriptors that can be
weighted differently to map to a variety of opinion criteria.

In some instances, a boundary may be completely absent, or a spurious
boundary may be present, for example when a "ghost" image is formed by multipath
reflection. In this case, the presence or absence of the boundary itself is the error.

Figure 3 is a representation of the perceptual stage, which measures the
subjective significance of any errors present in the image sequence. The original
signal 16 and the degraded signal 16d, each filtered and masked as described with
reference to Figure 1, are first each analysed (either in parallel or sequentially) in a
component extraction process 31 to identify characteristics of the edges or
boundaries of the principal components of each image. These characteristics are
supplied as inputs 32, 32d to a comparison process 33 which generates an output 38
indicative of the overall perceptual degradation of the degraded image with respect to
the original image.

The components identified by the extraction process 31 may be distinguished
by:

• Luminance (illustrated in Figure 4) and Colour

• Strong Edges (illustrated in Figure 5)

• Closure Effects (illustrated in Figure 6)

• Texture (illustrated in Figure 7)

• Movement

• Binocular (Stereoscopic) Disparities.

These last two effects rely on phenomena relating to movement and
stereoscopy, not readily illustrated on the printed page. For similar reasons, only
luminance differences, and not colour differences, are illustrated in Figure 4.

Figures 4 to 7 all depict a circle and a square, the square obscuring part of
the circle. In each case, the boundary between the two elements is readily perceived,
although the two elements are represented in different ways. In Figure 4, the circle
and square have different luminance: the circle is black and the square is white. A
boundary is perceived at the locations where this property changes. It will be noted
that in Figures 5, 6 and 7 there are also locations where the luminance changes (for
example the boundaries between each individual stripe in Figure 7) but these are not
perceived as the principal boundaries of the image.

Figure 5 illustrates a boundary defined by an edge. A "strong edge", or
outline, is a narrow linear feature, of a colour or luminance contrasting with the
regions on either side of it. The viewer perceives this linear feature not primarily as a
component in its own right, but as a boundary separating the components either side
of it. In analysis of the image, such an edge can be identified by a localised
high-frequency element in the filtered signal. Suitable processes for identifying edges have
been developed, for example the edge extraction process described by S M Smith and
J M Brady in "SUSAN - A new approach to low-level image processing" (Technical
Report TR95SMS1c, Oxford Centre for Functional Magnetic Resonance Imaging of
the Brain, 1995).

In many circumstances a viewer can perceive an edge where no continuous
line is present. An example is shown in Figure 6, where the lines are discontinuous.
The human perceptual system carries out a process known as "closure", which tends
to complete such partial edges. (A further example is illustrated by the fact that none
of Figures 4 to 7 actually depicts a full circle. The viewer infers the presence of a
circle from the four lenticular regions actually depicted in each Figure). Various
processes have been developed to emulate the closure process carried out by the
human perceptual system. One such process is described by Kass M., Witkin A., and
Terzopoulos D., "Snakes: Active Contour Models", published in the Proceedings of
First International Conference on Computer Vision 1987, pages 259-269.

"Texture" can be identified in many regions in which the properties already
mentioned are not constant. For example, in a region occupied by parallel lines, of a
colour or luminance contrasting with the background, the individual location of each
line is not of great perceptual significance. However, if the lines have different
orientations in different parts of the region, an observer will perceive a boundary
where the orientation changes. This property is found for instance in the orientation
of brushstrokes in paintings. An example is shown in Figure 7, in which the circle and
square are defined by two orthogonal series of parallel bars. Note that if the image is
enlarged such that the angular separation of the stripes is closer to the peak value
shown in Figure 2 and the dimensions of the square and circle further from that peak
value, the individual stripes would become the dominant features, instead of the
square and circle. It will also be apparent that if the orientations of the bars were
different, the boundary between the square and the circle may become less distinct.
To identify the texture content of a region of the image, the energy content in each
channel output from the Gabor filters 13 is used. Each channel represents a given
spatial frequency and orientation. By identifying regions where a given channel or
channels have high energy content, regions of similar texture can be identified.

Shapes can be discerned by the human perceptual system in other ways, not
illustrated in the accompanying drawings. In particular, disparities between related
images, such as the pairs of image frames used in stereoscopy, or successive image
frames in a motion picture, may identify image elements not apparent on inspection
of a single frame. For example, if two otherwise similar images, with no discernible
structure in either individual image, include a region displaced in one image in relation
to its position in the other, the boundaries of that region can be discerned if the two
images are viewed simultaneously, one by each eye. Similarly, if a region of
apparently random pixels moves coherently across another such region in a moving
image, that region will be discernible to an observer, even though no shape would be
discernible in an individual frame taken from the sequence. This phenomenon is
observable in the natural world - there are many creatures such as flatfish, which
have colouring similar to their environment, and which are only noticeable when they
move.
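
The random-pixel example can be reproduced in a few lines, as a sketch only. It assumes binary frames and a displacement that is already known; a practical system would have to search over candidate displacements. The function names, window radius and threshold are illustrative choices, not part of the described embodiment.

```python
import numpy as np

def local_mean(values, radius):
    """Box average over a (2*radius+1)-square window, with wrap-around edges."""
    total = np.zeros(values.shape, dtype=float)
    for d_row in range(-radius, radius + 1):
        for d_col in range(-radius, radius + 1):
            total += np.roll(values, (d_row, d_col), axis=(0, 1))
    return total / (2 * radius + 1) ** 2

def coherent_motion_mask(frame_a, frame_b, displacement, radius=2, threshold=0.95):
    """Mark pixels of frame_b whose neighbourhood matches frame_a moved by `displacement`."""
    moved = np.roll(frame_a, displacement, axis=(0, 1))
    agreement = (moved == frame_b).astype(float)
    # Inside a coherently moving region every pixel agrees; over a static random
    # background only about half do, so a high local agreement marks the region.
    return local_mean(agreement, radius) >= threshold
```

Applied to two frames of random dots in which a square patch has moved three pixels to the right, the mask picks out the patch although neither frame on its own contains any visible shape.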

The component extraction process identifies the boundaries of the principal
elements of both the original and degraded signals. The perceptual importance of
each boundary depends on a number of factors, such as its nature (edge, colour
change, texture, etc), the degree of contrast involved, and its context. In this latter
category, a high frequency component to the filtered and masked signal will signify
that there are a large number of individual edges present in that region of the image.
This will reduce the significance of each individual edge - compare Figure 5, which
has few such edges, with Figure 7, which has many more such edges.
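One way to picture this dependence on context is to scale the significance of each
edge pixel by the number of edge pixels around it. The sketch below does this with a
simple reciprocal rule; the window size and the 1/count rule are assumptions chosen
for illustration and are not taken from the specification.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def contextual_significance(edges, window=5):
    """Divide each edge pixel's significance by the count of edge pixels nearby.

    `edges` is a binary map (1 = edge pixel). The window size and the
    reciprocal rule are illustrative assumptions.
    """
    edges = np.asarray(edges, dtype=float)
    pad = window // 2
    padded = np.pad(edges, pad, mode="constant")
    # Count the edge pixels in the window centred on every pixel.
    counts = sliding_window_view(padded, (window, window)).sum(axis=(-2, -1))
    significance = np.zeros_like(edges)
    np.divide(edges, counts, out=significance, where=counts > 0)
    return significance

sparse = np.zeros((9, 9))
sparse[4, 4] = 1.0          # a single isolated edge pixel
dense = np.ones((9, 9))     # a region crowded with edges
isolated = contextual_significance(sparse)[4, 4]
crowded = contextual_significance(dense)[4, 4]
# The isolated edge keeps full significance; each edge in the crowded region
# carries only a small share.
```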

Each individual extraction process carried out in the component splitting step,
on its own, typically performs relatively poorly, as they all tend to create false
boundaries, and fail to detect others. However, the combination of different
processes increases the quality of the result, a visual object being often defined by
many perceptual boundaries, as discussed by Scassellati, B.M. in "High-level
perceptual contours from a variety of low-level physical features" (Master's Thesis,
Massachusetts Institute of Technology, May 1995). For this reason the comparison
process 33 compares all the boundaries together, regardless of how they are defined
except insofar as this affects their perceptual significance, to produce a single
aggregated output 38.
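A minimal sketch of such a combination is given below. It assumes that each extraction
process yields a boundary-strength map with values between 0 and 1, and that the maps
are merged by taking, at each pixel, the largest strength after scaling by a per-cue
perceptibility factor. The merging rule, the cue names and the factor values are all
hypothetical; the specification does not prescribe them.

```python
import numpy as np

def combine_boundaries(maps, perceptibility):
    """Merge per-cue boundary maps into a single map of perceptual significance.

    maps:           dict of cue name -> 2-D array of boundary strength in [0, 1]
    perceptibility: dict of cue name -> scale factor (assumed values)
    """
    first_map = next(iter(maps.values()))
    combined = np.zeros(first_map.shape, dtype=float)
    for cue, strength in maps.items():
        # A boundary counts however it is defined; the cue only scales its significance.
        combined = np.maximum(combined, perceptibility[cue] * strength)
    return combined

edge_map = np.zeros((5, 5))
edge_map[:, 3] = 1.0        # a vertical boundary found by edge detection
texture_map = np.zeros((5, 5))
texture_map[2, :] = 1.0     # a horizontal boundary found by texture analysis
combined = combine_boundaries(
    {"edge": edge_map, "texture": texture_map},
    {"edge": 1.0, "texture": 0.6},
)
```

Here the texture boundary is retained in the combined map, but at a lower significance
than the edge boundary, and a pixel lying on both boundaries takes the higher value.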

The results 32, 32d of the component analysis 31 are passed to a
comparison process 33, in which the component boundaries identified in each signal
are compared. By comparing the perceptual relevance of all boundary types in the
image, a measure of the overall perceptual significance of degradation of a signal can
be determined, and provided as an output 38. The perceptual significance of errors in
a degraded signal depends on the context in which they occur. For example, the loss
or gain of a diagonal line (edge) in Figure 7 would have little effect on the viewer's
perception of the image, but the same error, if applied to Figure 5, would have a
much greater significance. Similarly, random dark specks would have a much greater
effect on the legibility of Figure 6 than they would on Figure 4.

In more detail, the comparison process 33 consists of a number of individual
elements. The first element identifies the closest match between the arrangements of
the boundaries in the two images (34), and uses this to effect a bulk translation of
one image with respect to the other (35) so that these boundaries correspond.
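The matching and translation elements could be realised in many ways. The sketch
below assumes one of them: the peak of a circular cross-correlation between the two
boundary maps is taken as the closest match, and the degraded map is then shifted by
that offset. The use of cross-correlation, the wrap-around at the image borders and
the function names are assumptions made for this sketch.

```python
import numpy as np

def best_translation(reference, degraded):
    """Return the (dy, dx) shift of `degraded` that best matches `reference`."""
    spectrum = np.fft.fft2(reference) * np.conj(np.fft.fft2(degraded))
    correlation = np.real(np.fft.ifft2(spectrum))
    dy, dx = np.unravel_index(np.argmax(correlation), correlation.shape)
    height, width = reference.shape
    # Offsets beyond half the image size represent negative shifts.
    if dy > height // 2:
        dy -= height
    if dx > width // 2:
        dx -= width
    return int(dy), int(dx)

def align(reference, degraded):
    """Apply the bulk translation so that the boundaries correspond."""
    return np.roll(degraded, best_translation(reference, degraded), axis=(0, 1))

reference = np.zeros((32, 32))
reference[10, 5:20] = 1.0   # a horizontal boundary
reference[5:25, 12] = 1.0   # a vertical boundary
degraded = np.roll(reference, (3, -4), axis=(0, 1))  # same boundaries, displaced
shift = best_translation(reference, degraded)
aligned = align(reference, degraded)
```

In this example the degraded map was displaced by three rows down and four columns
left, so the recovered shift moves it three rows up and four columns right.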

The next process 36 identifies features to which the human cognitive system
is most sensitive, and weighting factors W are generated for such features. For
example, it is possible to weight the cognitive relevance of perceptually critical image
elements such as those responsible for visual speech cues, as it is known that certain
facial features are principally responsible for visual speech cues. See for example
Rosenblum, L.D., & Saldaña, H.M. (1996). "An audiovisual test of kinematic
primitives for visual speech perception". (Journal of Experimental Psychology: Human
Perception and Performance, vol 22, pages 318-331) and Jordan, T.R. & Thomas,
S.M. (1998). "Anatomically guided construction of point-light facial images".
(Technical report: Human Perception and Communication Research Group, University
of Nottingham, Nottingham, U.K).

It can be inferred that a face is present using pattern recognition or by virtue of
the feature of the service delivering the image.

The perceptual significance of each boundary in one image is then compared
with the corresponding boundary (if any) in the other (37), and an output 38
generated according to the degree of difference in such perceptual significance and
the weightings W previously determined. It should be noted that differences in how
the boundary is defined (hard edge, colour difference, etc) do not necessarily affect
the perceptual significance of the boundary, so all the boundaries, however defined,
are compared together. Moreover, since the presence of a spurious boundary can be
as perceptually significant as the absence of a real one, it is the absolute difference in
perceptibility that is determined.
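The comparison just described can be sketched as a weighted absolute difference
between two maps of boundary perceptibility. In the sketch below the maps are assumed
to be already aligned, and the weights array stands for the weighting factors W. The
normalisation by the sum of the weights, and the reduction of the output 38 to a single
number, are assumptions made for illustration.

```python
import numpy as np

def perceptual_difference(original, degraded, weights):
    """Weighted mean absolute difference in boundary perceptibility."""
    # The absolute value treats a spurious boundary and a missing one alike.
    difference = np.abs(original - degraded)
    return float(np.sum(weights * difference) / np.sum(weights))

original = np.zeros((4, 4))
original[1, 1] = 1.0                    # one boundary element in the original

missing = np.zeros((4, 4))              # degraded signal has lost the boundary
spurious = original.copy()
spurious[2, 2] = 1.0                    # degraded signal has gained a false one

uniform = np.ones((4, 4))
score_missing = perceptual_difference(original, missing, uniform)
score_spurious = perceptual_difference(original, spurious, uniform)

# Raising the weight of the lost element (for example a facial feature that
# carries visual speech cues) raises the score.
emphasis = uniform.copy()
emphasis[1, 1] = 3.0
score_weighted = perceptual_difference(original, missing, emphasis)
```

With uniform weights the missing and the spurious boundary give the same score, which
reflects the use of the absolute difference.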

Note that degradation of the signal may have caused a boundary defined by,
for example, an edge, to disappear, but the boundary may still be discernible because
of some other difference such as colour, luminance or texture. The error image
produced by established models (filtered and masked noise) provides an indication of
the visible degradation of the image. The comparison process 37 includes a measure
of the extent to which the essential content is maintained and offers an improved
measure of the image intelligibility. In comparing the boundaries (step 37), the
perceptual significance of a given boundary may depend on its nature. A boundary
between different textures may be less well defined than one defined by an edge, and
such reduced boundary perceptibility is taken into account in generating the output.

This process is suitable for a great range of video quality assessment
applications, where identification and comparison of the perceptual boundaries is
necessary. A good example is given by very low bandwidth systems where a face is
algorithmically reconstructed. This would be impossible for many of the previously
known visual models to assess appropriately. The comparison of perceptual
boundaries also enables the assessment of synthetic representations of images, such
as an animated talking face, in which the features of the image that facilitate
subsequent cognitive interpretation as a face are of prime importance.
CLAIMS

1. A method of measuring the differences between a first video signal (16) and a
second video signal (16d), comprising the steps of:

analysing (31) the information content of each video signal to identify the
perceptually relevant boundaries of the video images depicted therein;

comparing (33) the boundaries so defined in the first signal with those in the
second signal; the comparison including determination of the extent to which the
properties of the boundaries defined in the original image are preserved, and

generating an output (38) indicative of the perceptual difference between the
first and second signals.

2. A method according to Claim 1 in which the information content is analysed
for a plurality of boundary-identifying characteristics (32, 32d), and the properties of
the boundaries on which the comparison (37) is based include the characteristics by
which such boundaries are defined in each of the signals.

3. A method according to claim 2, wherein the characteristics include the
presence of edges.

4. A method according to claim 2 or 3, wherein the characteristics include the
presence of disparities between frames.

5. A method according to claim 2, 3 or 4, wherein the characteristics include
changes in at least one of the properties of: luminance, colour or texture.

6. A method according to any of claims 1 to 5, in which the comparison
includes a comparison (36) of the perceptibility of corresponding boundaries identified
in the first and second signals.

7. A method according to any preceding claim, in which the comparison of the
images includes the steps of

identification (34) of the principal elements in each image, and

compensation (35) for differences in the relative positions of the said
principal elements.

8. A method according to any preceding claim, in which the analysis includes
identification of perceptually significant image features, and the output (38) indicative
of the perceptual difference between the first and second signals is weighted
according to the perceptual significance of such image features.

9. A method according to claim 8, in which the perceptually significant image
features are those characteristic of the human face.

10. A method according to claim 9, in which a weighting is applied to the output
according to the significance of the feature in providing visual cues to speech.

11. A method according to claim 8, in which the perceptually significant image
features are those by which individual text characters are distinguished.

12. Apparatus for measuring the differences between a first video signal (16) and
a second video signal (16d), comprising:

analysis means (31) for analysing the information content of each video signal to
identify the perceptually relevant boundaries of the video images depicted therein;

comparison means (33) for comparing the boundaries so defined in the first
signal (16) with those in the second signal (16d); the comparison including
determination of the extent to which the properties of the boundaries defined in the
original image are preserved,

and means for generating an output (38) indicative of the perceptual
difference between the first and second signals (16, 16d).

13. Apparatus according to Claim 12, wherein the analysis means (31) is
arranged to analyse the information content in the signals (16, 16d) for a plurality of
boundary-identifying characteristics (32, 32d), and the comparison means (33) is
arranged to compare the characteristics by which such boundaries are defined in each
of the signals.
14. Apparatus according to claim 13, wherein the analysis means (31) includes
means to identify the presence of edges.

15. Apparatus according to claim 13 or 14, wherein the analysis means (33)
includes means to identify the presence of disparities between frames.

16. Apparatus according to claim 13, 14 or 15, wherein the analysis means (33)
includes means to identify differences in at least one of the properties of: luminance,
colour or texture.

17. Apparatus according to any of claims 12 to 16, in which the comparison
means (33) includes means (36) for determining the perceptibility of the boundaries
identified in the first and second signals.

18. Apparatus according to any of claims 12 to 17, in which the comparison
means (33) includes image matching means (34) for identification of the principal
elements in each image and translation means (35) for effecting translation of one
image (16d) to compensate for differences in the relative positions of such elements
in the first and second images.

19. Apparatus according to any of claims 12 to 18, in which the comparison
means (33) includes weighting means 36 for identifying perceptually significant image
features in the components (32, 32d), and weighting the output (38) according to the
perceptual significance of such image features.

20. Apparatus according to any of claims 12 to 19, further comprising visual
stage means (11; 12, 13; 14, 15) for processing original input signals (11) to emulate
the response of the human visual system, to generate modified input signals (16,
16d) for input to the analysis means (31).